user state
Adaptive XAI in High Stakes Environments: Modeling Swift Trust with Multimodal Feedback in Human AI Teams
Fernando, Nishani, Nakisa, Bahareh, Ahmad, Adnan, Rastgoo, Mohammad Naim
Effective human-AI teaming heavily depends on swift trust, particularly in high-stakes scenarios such as emergency response, where timely and accurate decision-making is critical. In these time-sensitive and cognitively demanding settings, adaptive explainability is essential for fostering trust between human operators and AI systems. However, existing explainable AI (XAI) approaches typically offer uniform explanations and rely heavily on explicit feedback mechanisms, which are often impractical in such high-pressure scenarios. To address this gap, we propose a conceptual framework for adaptive XAI that operates non-intrusively by responding to users' real-time cognitive and emotional states through implicit feedback, thereby enhancing swift trust in high-stakes environments. The proposed adaptive explainability trust framework (AXTF) leverages physiological and behavioral signals, such as EEG, ECG, and eye tracking, to infer user states and support explanation adaptation. At its core is a multi-objective, personalized trust estimation model that maps workload, stress, and emotion to dynamic trust estimates. These estimates guide the modulation of explanation features, enabling responsive and personalized support that promotes swift trust in human-AI collaboration. This conceptual framework establishes a foundation for developing adaptive, non-intrusive XAI systems tailored to the rigorous demands of high-pressure, time-sensitive environments.
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > California > Ventura County > Thousand Oaks (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
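The state-to-trust mapping the AXTF abstract describes can be sketched as a toy function. Everything concrete here is an illustrative assumption, not part of the proposal: the linear weights, the logistic squashing, and the detail thresholds are invented for the sketch, and real inputs would come from EEG/ECG/eye-tracking inference rather than hand-set scalars.

```python
import math

def estimate_trust(workload, stress, valence, weights=(0.4, 0.4, 0.2)):
    """Toy multi-objective trust estimate (hypothetical weights).
    Inputs are normalized to [0, 1]; higher workload and stress lower
    the estimate, positive emotional valence raises it."""
    w_wl, w_st, w_em = weights
    raw = w_wl * (1 - workload) + w_st * (1 - stress) + w_em * valence
    # Squash through a logistic so the estimate stays in (0, 1)
    return 1 / (1 + math.exp(-6 * (raw - 0.5)))

def modulate_explanation(trust, low=0.35, high=0.7):
    """Map the trust estimate to a coarse explanation-detail level
    (thresholds are illustrative)."""
    if trust < low:
        return "detailed"   # low trust: give richer justification
    if trust > high:
        return "minimal"    # high trust: stay out of the operator's way
    return "moderate"
```

For example, a stressed, overloaded operator (`workload=0.9, stress=0.9`) would receive detailed explanations, while a calm one would receive minimal ones; an actual system would adapt more explanation features than a single detail level.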
Evaluation of LLMs-based Hidden States as Author Representations for Psychological Human-Centered NLP Tasks
Soni, Nikita, Chitale, Pranav, Singh, Khushboo, Balasubramanian, Niranjan, Schwartz, H. Andrew
Like most of NLP, models for human-centered NLP tasks -- tasks attempting to assess author-level information -- predominantly use representations derived from hidden states of Transformer-based LLMs. However, what component of the LM is used for the representation varies widely. Moreover, there is a need for Human Language Models (HuLMs) that implicitly model the author and provide a user-level hidden state. Here, we systematically evaluate different ways of representing documents and users using different LM and HuLM architectures to predict task outcomes as both dynamically changing states and averaged trait-like user-level attributes of valence, arousal, empathy, and distress. We find that representing documents as an average of the token hidden states performs the best generally. Further, while a user-level hidden state itself is rarely the best representation, we find its inclusion in the model strengthens the token or document embeddings used to derive document- and user-level representations, resulting in the best performance.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- (6 more...)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.68)
- Government (0.93)
- Information Technology > Security & Privacy (0.68)
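The best-performing representation above, mean-pooling token hidden states into a document vector and averaging those per user, can be sketched with plain NumPy. The array shapes and function names are illustrative assumptions; the paper's experiments use actual LM hidden states, not synthetic arrays.

```python
import numpy as np

def document_embedding(token_hidden_states, attention_mask):
    """Mean-pool final-layer token hidden states into one document vector,
    ignoring padding positions.
    Shapes: token_hidden_states (seq_len, dim), attention_mask (seq_len,)."""
    mask = attention_mask[:, None].astype(float)       # (seq_len, 1)
    summed = (token_hidden_states * mask).sum(axis=0)  # (dim,)
    return summed / mask.sum()

def user_embedding(doc_vectors):
    """Average a user's document vectors into a trait-like user representation."""
    return np.mean(doc_vectors, axis=0)
```

The padding mask matters: without it, pad-token states would drag the mean toward arbitrary values, which is why pooling is done over attended positions only.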
When Online Algorithms Influence the Environment: A Dynamical Systems Analysis of the Unintended Consequences
Lankireddy, Prabhat, Nair, Jayakrishnan, Manjunath, D
We analyze the effect that online algorithms have on the environment that they are learning. As a motivation, consider recommendation systems that use online algorithms to learn optimal product recommendations based on user and product attributes. It is well known that the sequence of recommendations affects user preferences. However, typical learning algorithms treat the user attributes as static and disregard the impact of their recommendations on user preferences. Our interest is to analyze the effect of this mismatch between the model assumption of a static environment, and the reality of an evolving environment affected by the recommendations. To perform this analysis, we first introduce a model for a generic coupled evolution of the parameters that are being learned, and the environment that is affected by them. We then frame a linear bandit recommendation system (RS) into this generic model where the users are characterized by a state variable that evolves based on the sequence of recommendations. The learning algorithm of the RS does not explicitly account for this evolution and assumes that the users are static. A dynamical system model that captures the coupled evolution of the population state and the learning algorithm is described, and its equilibrium behavior is analyzed. We show that when the recommendation algorithm is able to learn the population preferences in the presence of this mismatch, the algorithm induces similarity in the preferences of the user population. In particular, we present results on how different properties of the recommendation algorithm, namely the user attribute space and the exploration-exploitation tradeoff, affect the population preferences when they are learned by the algorithm. We demonstrate these results using model simulations.
Algorithmic Content Selection and the Impact of User Disengagement
Calvano, Emilio, Haghtalab, Nika, Vitercik, Ellen, Zhao, Eric
The content selection problem of digital services is often modeled as a decision process where a service chooses, over multiple rounds, an arm to pull from a set of arms that each return a certain reward. This classical model does not account for the possibility that users disengage when dissatisfied and thus fails to capture an important trade-off between choosing content that promotes future engagement versus immediate reward. In this work, we introduce a model for the content selection problem where dissatisfied users may disengage and where the content that maximizes immediate reward does not necessarily maximize the odds of future user engagement. We show that when each arm's expected reward and its effect on user satisfaction are linearly related, an optimal content selection policy can be computed efficiently with dynamic programming under natural assumptions about the complexity of the users' engagement patterns. Moreover, we show that in an online learning setting where users with unknown engagement patterns arrive, there is a variant of Hedge that attains a $\tfrac 12$-competitive ratio regret bound. We also use our model to identify key primitives that determine how digital services should weigh engagement against revenue. For example, when it is more difficult for users to rejoin a service they are disengaged from, digital services naturally see a reduced payoff but user engagement may -- counterintuitively -- increase.
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
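The reward-versus-engagement trade-off can be made concrete with a small finite-horizon dynamic program. This is a toy instance under assumed dynamics (a discrete satisfaction level that each arm shifts, with disengagement at level zero), not the paper's model or its efficiency result:

```python
import numpy as np

def optimal_policy(rewards, sat_shift, S=5, horizon=20):
    """Backward-induction DP over satisfaction level s in {0..S}.
    Pulling arm a yields rewards[a] and moves s by sat_shift[a];
    at s == 0 the user disengages and no further reward accrues."""
    n = len(rewards)
    V = np.zeros(S + 1)                          # value-to-go per level
    policy = np.zeros((horizon, S + 1), dtype=int)
    for t in range(horizon - 1, -1, -1):
        V_new = np.zeros(S + 1)
        for s in range(1, S + 1):                # s == 0 stays at value 0
            q = [rewards[a] + V[min(max(s + sat_shift[a], 0), S)]
                 for a in range(n)]
            policy[t, s] = int(np.argmax(q))
            V_new[s] = max(q)
        V = V_new
    return policy, V
```

With a high-reward arm that erodes satisfaction and a low-reward arm that restores it (`rewards=[1.0, 0.3]`, `sat_shift=[-1, +1]`), the DP picks the satisfying arm near the disengagement boundary early in the horizon, and the greedy arm at the final step, exactly the future-engagement-versus-immediate-reward trade-off described above.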
Planning with Large Language Models for Conversational Agents
Li, Zhigen, Peng, Jianxiang, Wang, Yanmeng, Shen, Tianhao, Zhang, Minghui, Su, Linxi, Wu, Shang, Wu, Yihang, Wang, Yuqian, Wang, Ye, Hu, Wei, Li, Jianfeng, Wang, Shaojun, Xiao, Jing, Xiong, Deyi
Controllability and proactivity are crucial properties of autonomous conversational agents (CAs). Controllability requires the CAs to follow standard operating procedures (SOPs), such as verifying identity before activating credit cards. Proactivity requires the CAs to guide the conversation towards the goal when users are uncooperative, as in persuasive dialogue. No existing research unifies controllability, proactivity, and low manual annotation. To bridge this gap, we propose a new framework for planning-based conversational agents (PCA) powered by large language models (LLMs), which only requires humans to define tasks and goals for the LLMs. Before the conversation, the LLM plans the core and necessary SOP for the dialogue offline. During the conversation, the LLM plans the best action path online by referring to the SOP, and generates responses to achieve process controllability. Subsequently, we propose a semi-automatic dialogue data creation framework and curate a high-quality dialogue dataset (PCA-D). Meanwhile, we develop multiple variants and evaluation metrics for PCA, e.g., planning with Monte Carlo Tree Search (PCA-M), which searches for the optimal dialogue action while satisfying SOP constraints and achieving dialogue proactivity. Experimental results show that LLMs finetuned on PCA-D can significantly improve performance and generalize to unseen domains. PCA-M outperforms other CoT and ToT baselines in terms of conversation controllability, proactivity, task success rate, and overall logical coherence, and is applicable in industry dialogue scenarios. The dataset and codes are available at XXXX.
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Europe > Italy (0.04)
- Asia > China > Tianjin Province > Tianjin (0.04)
- (6 more...)
- Workflow (0.93)
- Research Report (0.84)
- Banking & Finance (1.00)
- Information Technology (0.94)
- Leisure & Entertainment > Games (0.46)
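The SOP-constraint check that PCA's online planning relies on can be sketched as a prerequisite filter over a task graph. The SOP content below (the credit-card steps) and the representation (action name mapped to a set of prerequisite actions) are hypothetical; this is only the constraint check, not the PCA-M tree search itself.

```python
def allowed_actions(sop, completed):
    """Return dialogue actions whose SOP prerequisites are all satisfied
    and that have not been taken yet."""
    return [a for a, prereqs in sop.items()
            if a not in completed and prereqs <= completed]

# Hypothetical SOP for the credit-card example: action -> prerequisite actions
sop = {
    "greet": set(),
    "verify_identity": {"greet"},
    "activate_card": {"verify_identity"},
}
```

A planner would score only the actions this filter returns, which is how "verify identity before activating credit cards" becomes a hard constraint on the action path.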
User Modeling Challenges in Interactive AI Assistant Systems
Interactive Artificial Intelligence (AI) assistant systems are designed to offer timely guidance that helps human users complete a variety of tasks. One of the remaining challenges is understanding users' mental states during the task so as to provide more personalized guidance. In this work, we analyze users' mental states during task execution and investigate the capabilities and challenges of large language models in interpreting user profiles for more personalized user guidance. In the digital age, there is immense potential for AI assistants to guide users through complex tasks, from changing laptop batteries to piping frosting on a cake. One of the main challenges, however, lies in creating an interactive system that can not only understand which step the user is at, but can also detect the user's mental states, such as frustration, familiarity with the task, and detail-orientation.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
- (2 more...)
Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning
Xu, Ruiyang, Bhandari, Jalaj, Korenkevych, Dmytro, Liu, Fan, He, Yuchen, Nikulkov, Alex, Zhu, Zheqing
Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement a one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is grounded in dynamic programming ideas, which show that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system that handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.
- Information Technology > Services (0.66)
- Marketing (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
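The two ingredients named above, a TD(0) value update and a one-step bias of the auction toward long-term value, can be sketched in a few lines. The tabular value function, the blending weight, and the candidate/bid names are illustrative assumptions; the production system described in the abstract is far richer.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) backup from a logged transition (s, r, s_next):
    V(s) <- V(s) + alpha * (r + gamma * V(s_next) - V(s))."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

def rerank_candidates(bids, long_term_value, weight=0.5):
    """One-step-improvement sketch: blend each candidate's immediate auction
    score with a learned long-term value estimate before picking the winner."""
    scores = {c: b + weight * long_term_value.get(c, 0.0)
              for c, b in bids.items()}
    return max(scores, key=scores.get)
```

Because the long-term term is added on top of the existing auction score rather than replacing it, the allocation stays compatible with the auction framework, which is the compatibility property the abstract emphasizes.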
Causal Inference for Chatting Handoff
Zhong, Shanshan, Qin, Jinghui, Huang, Zhongzhan, Li, Daifeng
Aiming to ensure chatbot quality by predicting chatbot failure and enabling human-agent collaboration, Machine-Human Chatting Handoff (MHCH) has attracted much attention from both industry and academia in recent years. However, most existing methods mainly focus on the dialogue context or assist with global satisfaction prediction based on multi-task learning, ignoring the grounded relationships among causal variables such as user state and labor cost. These variables are significantly associated with handoff decisions, resulting in prediction bias and increased cost. Therefore, we propose the Causal-Enhance Module (CEM), built on the causal graph of MHCH over these two variables; it is a simple yet effective module that can easily be plugged into existing MHCH methods. For the impact of users, we use the user state to correct the prediction bias according to the causal relationships in the multi-task setting. For the labor cost, we train an auxiliary cost simulator to calculate unbiased labor cost through counterfactual learning so that the model becomes cost-aware. Extensive experiments conducted on four real-world benchmarks demonstrate the effectiveness of CEM in generally improving the performance of existing MHCH methods without any elaborate model crafting.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Slovakia > Bratislava > Bratislava (0.04)
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Christakopoulou, Konstantina, Xu, Can, Zhang, Sai, Badam, Sriraj, Potter, Trevor, Li, Daniel, Wan, Hao, Yi, Xinyang, Le, Ya, Berg, Chris, Dixon, Eric Bencomo, Chi, Ed H., Chen, Minmin
How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combatting sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, it has been found that surveys explicitly asking users to rate their experience with consumed items can provide valuable orthogonal information to the engagement/interaction data, acting as a proxy to the underlying user satisfaction. For sparsity, i.e., only being able to observe how satisfied users are with a tiny fraction of user-item interactions, imputation models can be useful in predicting the satisfaction level for all items users have consumed. For learning satisfying recommender policies, we postulate that reward shaping in RL recommender agents is powerful for driving satisfying user experiences. Putting everything together, we propose to jointly learn a policy network and a satisfaction imputation network: the role of the imputation network is to learn which actions are satisfying to the user, while the policy network, built on top of REINFORCE, decides which items to recommend, with the reward utilizing the imputed satisfaction. We use both offline analysis and live experiments in an industrial large-scale recommendation platform to demonstrate the promise of our approach for satisfying user experiences.
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Macao (0.04)
- Asia > China (0.04)
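The core mechanics above, a shaped reward built from engagement plus imputed satisfaction, fed into a REINFORCE update, can be sketched for a softmax policy over items. The blending weight `beta` and the tabular setup are illustrative assumptions; the paper jointly trains a policy network and an imputation network rather than these toy functions.

```python
import numpy as np

def shaped_reward(engagement, imputed_satisfaction, beta=0.5):
    """Shaped reward: engagement signal plus a weighted imputed-satisfaction
    term supplied by the (here hypothetical) imputation model."""
    return engagement + beta * imputed_satisfaction

def reinforce_grad(logits, action, reward):
    """REINFORCE gradient w.r.t. the logits of a softmax policy over items:
    reward * d/dlogits log pi(action)."""
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    grad_log_pi = -probs                    # d log pi(a) / d logits = 1{a} - pi
    grad_log_pi[action] += 1.0
    return reward * grad_log_pi
```

Since the shaped reward scales the whole gradient, items the imputation model predicts to be satisfying get a proportionally larger push, which is how the shaping steers the policy beyond raw engagement.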
From Clicks to Conversions: Recommendation for long-term reward
Chagniot, Philomène, Vasile, Flavian, Rohde, David
A modern approach to recommendation will look at this log in order to improve future recommendations. By examining how similar users respond to different recommendations, it becomes possible to discover better recommendations and continue to improve the system. This procedure of learning by experimentation in some respects mimics randomized control trials in medicine, where populations are split into two and different treatments are delivered to similar groups. Medical trials are, however, simpler: an intervention or a placebo is administered to each group, and long-term impacts are then observed with no further interventions delivered. This raises the challenges of credit attribution in the case of delayed rewards and multiple actions. In contrast with medical trials, where the treatment is frequently a binary variable, recommender systems deliver multiple actions at variable times, leading to combinatorially complex treatments. For simplicity, in our previous work on RecoGym [2], we assumed that both the current recommendation and the reward are conditionally independent of past actions, thereby making recommendation amenable to contextual bandits and supervised value-modeling approaches.